Statistical clustering of temporal networks through a dynamic stochastic block model
Statistical node clustering in discrete time dynamic networks is an emerging
field that raises many challenges. Here, we explore statistical properties and
frequentist inference in a model that combines a stochastic block model (SBM)
for its static part with independent Markov chains for the evolution of the
node groups through time. We model binary data as well as weighted dynamic
random graphs (with discrete or continuous edge values). Our approach,
motivated by the importance of controlling for label switching issues across
the different time steps, focuses on detecting groups characterized by a stable
within-group connectivity behavior. We study identifiability of the model
parameters, propose an inference procedure based on a variational expectation
maximization algorithm as well as a model selection criterion to select the
number of groups. We carefully discuss our initialization strategy which plays
an important role in the method and compare our procedure with existing ones on
synthetic datasets. We also illustrate our approach on dynamic contact
networks, one of encounters among high school students and two others on animal
interactions. An implementation of the method is available as an R package
called dynsbm.
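The generative side of the model described above (a static SBM at each time step, with each node's group evolving along its own Markov chain) can be sketched as a small simulation. This is a minimal illustration, not the dynsbm package API and not the variational EM inference procedure. The function name `simulate_dynamic_sbm` and its parameters are hypothetical, and the sketch assumes binary undirected edges and connection probabilities that stay constant over time.

```python
import random


def simulate_dynamic_sbm(n_nodes, n_steps, init_probs, transition, connect, seed=0):
    """Simulate group memberships and binary undirected graphs over time.

    init_probs[q]    : probability that a node starts in group q
    transition[q][r] : probability that a node moves from group q to group r
    connect[q][r]    : probability of an edge between groups q and r
                       (assumed constant over time in this sketch)
    """
    rng = random.Random(seed)
    groups = list(range(len(init_probs)))
    # Initial group of each node.
    z = [rng.choices(groups, weights=init_probs)[0] for _ in range(n_nodes)]
    memberships, graphs = [], []
    for t in range(n_steps):
        if t > 0:
            # Each node follows its own independent Markov chain.
            z = [rng.choices(groups, weights=transition[q])[0] for q in z]
        # Static SBM at time t: edge (i, j) depends only on the two groups.
        edges = set()
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                if rng.random() < connect[z[i]][z[j]]:
                    edges.add((i, j))
        memberships.append(list(z))
        graphs.append(edges)
    return memberships, graphs


if __name__ == "__main__":
    memberships, graphs = simulate_dynamic_sbm(
        n_nodes=10,
        n_steps=3,
        init_probs=[0.5, 0.5],
        transition=[[0.9, 0.1], [0.1, 0.9]],
        connect=[[0.8, 0.05], [0.05, 0.8]],
    )
    for t, edges in enumerate(graphs):
        print(f"t={t}: groups={memberships[t]}, edges={len(edges)}")
```

Holding `connect` fixed across time steps is a simplification chosen here for brevity.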
Spatially-constrained clustering of ecological networks
Spatial ecological networks are widely used to model interactions between
georeferenced biological entities (e.g., populations or communities). The
analysis of such data often leads to a two-step approach where groups
containing similar biological entities are first identified and the spatial
information is used afterwards to improve the ecological interpretation. We
develop an integrative approach to retrieve groups of nodes that are
geographically close and ecologically similar. Our model-based
spatially-constrained method embeds the geographical information within a
regularization framework by adding some constraints to the maximum likelihood
estimation of parameters. A simulation study and the analysis of real data
demonstrate that our approach is able to detect complex spatial patterns that
are ecologically meaningful. The model-based framework allows us to consider
external information (e.g., geographic proximities, covariates) in the analysis
of ecological networks and appears to be an appealing alternative for analyzing
such data.
Strategies for online inference of model-based clustering in large and growing networks
In this paper we adapt online estimation strategies to perform model-based
clustering on large networks. Our work focuses on two algorithms, the first
based on the SAEM algorithm, and the second on variational methods. These two
strategies are compared with existing approaches on simulated and real data. We
use the method to decipher the connexion structure of the political websphere
during the US political campaign in 2008. We show that our online EM-based
algorithms offer a good trade-off between precision and speed, when estimating
parameters for mixture distributions in the context of random graphs.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS359 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Nine Quick Tips for Analyzing Network Data
These tips provide a quick and concentrated guide for beginners in the
analysis of network data.
Navigating in a sea of repeats in RNA-seq without drowning
The main challenge in de novo assembly of NGS data is certainly to deal with
repeats that are longer than the reads. This is particularly true for RNA-seq
data, since coverage information cannot be used to flag repeated sequences, of
which transposable elements are one of the main examples. Most transcriptome
assemblers are based on de Bruijn graphs and have no clear and explicit model
for repeats in RNA-seq data, relying instead on heuristics to deal with them.
The results of this work are twofold. First, we introduce a formal model for
representing high copy number repeats in RNA-seq data and exploit its
properties for inferring a combinatorial characteristic of repeat-associated
subgraphs. We show that the problem of identifying in a de Bruijn graph a
subgraph with this characteristic is NP-complete. In a second step, we show
that in the specific case of a local assembly of alternative splicing (AS)
events, we can implicitly avoid such subgraphs. In particular, we designed and
implemented an algorithm to efficiently identify AS events that are not
included in repeated regions. Finally, we validate our results using synthetic
data. We also give an indication of the usefulness of our method on real data.
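For readers unfamiliar with the data structure mentioned above, a de Bruijn graph can be built from reads in a few lines. The sketch below is a generic textbook construction, not the authors' algorithm or their formal repeat model. The function names `de_bruijn_graph` and `branching_nodes` are hypothetical, and branching nodes are shown only as a simple sign of ambiguity of the kind that repeats can create.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return the nodes that have more than one successor, in sorted order."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


if __name__ == "__main__":
    reads = ["ACGTA", "CGTC"]  # toy reads, for illustration only
    graph = de_bruijn_graph(reads, k=3)
    print(dict(graph))
    print(branching_nodes(graph))  # ['GT']
```

In this toy input the node `GT` is followed by both `TA` and `TC`, so an assembler walking the graph has two possible continuations at that point.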
Core-periphery dynamics in a plant-pollinator network
Mutualistic networks are highly dynamic, characterized by high temporal turnover of species and interactions. Yet, we have a limited understanding of how the internal structure of these networks and the roles species play in them vary through time. We used 6 years of observation data and a novel statistical method (dynamic stochastic block models) to assess how network structure and species' structural position within the network change throughout subseasons of the flowering season and across years in a quantitative plant-pollinator network from a dryland ecosystem in Argentina. Our analyses revealed a core-periphery structure persistent through subseasons and years. Yet, species' structural position as core or peripheral was highly dynamic: virtually all species that were at the core in some subseasons were also peripheral in other subseasons, while many other species always remained peripheral. Our results illuminate our understanding of the dynamics of mutualistic networks and have important implications for ecosystem management and conservation. Affiliation: Miele, Vincent. Centre National de la Recherche Scientifique; France. Affiliation: Ramos Jiliberto, Rodrigo. Universidad Mayor; Chile. Affiliation: Vazquez, Diego P. Universidad Nacional de Cuyo. Facultad de Ciencias Exactas y Naturales; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto Argentino de Investigaciones de las Zonas Áridas. Provincia de Mendoza. Instituto Argentino de Investigaciones de las Zonas Áridas. Universidad Nacional de Cuyo. Instituto Argentino de Investigaciones de las Zonas Áridas; Argentina.
Revealing the hidden structure of dynamic ecological networks
Recent technological advances and long-term data studies provide interaction data that can be modelled through dynamic networks, i.e., a sequence of different snapshots of an evolving ecological network. Most often time is the parameter along which these networks evolve, but any other one-dimensional gradient (temperature, altitude, depth, humidity, ...) could be considered. Here we propose a statistical tool to analyse the underlying structure of these networks and follow its evolution dynamics (either in time or along any other one-dimensional factor). It consists in extracting the main features of these networks and summarising them into a high-level view. We analyse a dynamic animal contact network and a seasonal food web, and in both cases we show that our approach allows for the identification of a backbone organisation as well as interesting temporal variations at the individual level. Our method, implemented in the R package dynsbm, can handle the largest ecological datasets and is a versatile and promising tool for ecologists who study dynamic interactions.
QuantiFERON-TB gold in-tube implementation for latent tuberculosis diagnosis in a public health clinic: a cost-effectiveness analysis
BACKGROUND: The tuberculin skin test (TST) has limitations for latent tuberculosis infection (LTBI) diagnosis in low-prevalence settings. Previously, all TST-positive individuals referred from the community to Baltimore City Health Department (BCHD) were offered LTBI treatment, after active TB was excluded. In 2010, BCHD introduced adjunctive QuantiFERON-TB Gold In-Tube (QFT-GIT) testing for TST-positive referrals. We evaluated costs and cost-effectiveness of this new diagnostic algorithm. METHODS: A decision-analysis model compared the strategy of treating all TST-positive referrals versus only those with positive results on adjunctive QFT-GIT testing. Costs were collected at BCHD, and Incremental Cost-Effectiveness Ratios (ICERs) were used to report on cost-effectiveness. RESULTS: QFT-GIT testing at BCHD cost 1,202 per quality-adjusted life-year gained and was considered highly cost-effective. In sensitivity analysis, the QFT-GIT strategy became cost-saving if QFT-GIT sensitivity increased above 92% or if less than 3.5% of individuals with LTBI progress to active TB disease. CONCLUSIONS: LTBI screening with TST in low-prevalence settings may lead to overtreatment and increased expenditures. In this public health clinic, additional QFT-GIT testing of individuals referred for a positive TST was cost-effective.
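The incremental cost-effectiveness ratio used above is, in its standard form, the difference in cost between two strategies divided by the difference in their effect (here, quality-adjusted life-years). The sketch below shows that arithmetic only. The function name `icer` is hypothetical, and the input numbers are invented for illustration; they are not taken from the study.

```python
def icer(cost_new, cost_ref, effect_new, effect_ref):
    """Incremental cost-effectiveness ratio of a new strategy versus a reference.

    Costs are in any single currency; effects are in QALYs.
    A negative value with a positive effect difference means the new
    strategy costs less and does more, i.e. it is cost-saving.
    """
    delta_cost = cost_new - cost_ref
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        raise ValueError("ICER is undefined when both strategies have the same effect")
    return delta_cost / delta_effect


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to make the arithmetic easy to follow.
    ratio = icer(cost_new=1500.0, cost_ref=1000.0, effect_new=10.5, effect_ref=10.0)
    print(ratio)  # 1000.0 per QALY gained
```

A full decision-analysis model would derive the four inputs from a decision tree over test results and disease progression; that step is outside this sketch.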
Deep learning for species identification of modern and fossil rodent molars
Reliable identification of species is a key step to assess biodiversity. In fossil and archaeological contexts, genetic identifications often remain difficult or even impossible, and morphological criteria are the only window on past biodiversity. Methods of numerical taxonomy based on geometric morphometrics provide reliable identifications at the specific and even intraspecific levels, but they remain relatively time consuming and require expertise on the group under study. Here, we explore an alternative based on computer vision and machine learning. The identification of three rodent species based on pictures of their molar tooth row constituted the case study. We focused on the first upper molar in order to transfer the model elaborated on modern, genetically identified specimens to isolated fossil teeth. A pipeline based on a deep neural network automatically cropped the first molar from the pictures, and returned a prediction regarding species identification. The deep-learning approach performed as well as geometric morphometrics and, provided an extensive reference dataset including fossil teeth, it was able to successfully identify teeth from an archaeological deposit that was not included in the training dataset. This is a proof-of-concept that such methods could allow fast and reliable identification of extensive amounts of fossil remains, often left unstudied in archaeological deposits for lack of time and expertise. Deep-learning methods may thus allow new insights on the biodiversity dynamics across the last 10,000 years, including the role of humans in extinction or recent evolution.
Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads
Background: The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. Results: The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99–111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644–652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086–1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods.
Based on the observation that repeats create complicated regions in a DBG, and when assemblers try to traverse these regions, they can infer erroneous transcripts, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. The originality of our work when compared to other transcriptome evaluation methods is that we use only the topology of the DBG, and not read or coverage information. We show that our simple method gives better results than Rsem-Eval (Li et al. in Genome Biol 15(12):553, 4) and TransRate (Smith-Unna et al. in Genome Res 26(8):1134–1144, 5) on both real and simulated datasets for detecting chimeras, and therefore is able to capture assembly errors missed by these methods.